This report is the first of two phases in a project whose objective is to predict the target features of Envifonmental, Social, and Govnernance Rankings as provided by the investment researcher Sustainalytics.
All data are freely available online through Yahoo Finance and with the assistance of scripting in RStudio we are able to merge, wrangle, sample, and visualise the data whose features will be utilised in the calibration of the prediction model.
Six visualisations in total are presented across a scope of three, two, and one variable. Each variable set contains two plots. This is not intended to be an exhaustive list.
There are three sources of data which have been combined into a single dataframe by Stock Symbol (primary key). The sources are outlined as follows;
# Import Packages for Wrangling
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import io
import requests
pd.set_option('display.max_columns', None) # so that we can see all the columns
# Read in the ESG data
df_source = "C:/Users/sbgrrtt-pc/Google Drive/__RMIT/23S1/MATH2319 - Machine Learning NEW/ML Practical/A_Phase 1/23.01.19 - ESG Data SP500 Sustainalytics.csv"
esgData = pd.read_csv(df_source,index_col='Index')
print(esgData.shape) # Define the shape of the data.
print(esgData.dtypes) # Observe all feature types as is.
# We note that ScrapDate and LastUpdateDate are listed as objects and not dates.
# This will be appropriate for our scope as the data in question is a snapshot at a point in time and not a time-series.
# We move to drop the variables we will not require at all.
esgData = esgData.drop(columns=["ScrapDate"])
esgData = esgData.drop(columns=["Link"])
SP500constituents = esgData['Symbol'].to_list() # Create list of PK items for scraping
esgData['LastUpdateDate'].unique() # Review Last Update Dates
import yfinance as yf # Importing for price data
# Define the start and end dates for the 5-year period
start_date = '2019-12-31'
end_date = '2022-12-31'
# Find adjusted close prices for each Symbol over the specified period
priceData = yf.download(SP500constituents, start=start_date, end=end_date)['Adj Close']
priceData.to_csv('SP500priceData.csv') # Save the price data to CSV
priceData = priceData.dropna(axis=1) # Drop variables without complete priceData available
priceData.head(25) # Review DF
# Calculate log returns over 1, and 3 years
log_returns_1d = np.log(priceData).diff().dropna()
log_returns_1d.shape
tdays = 252 # Estimate Trading Days per annum
totalDays = len(log_returns_1d) # Number of last trading Day
# Subset the data into 1 and 3 year chunks. Cumulative return ~ sum over time.
lnRet1yData = log_returns_1d.iloc[totalDays-(1*tdays):totalDays]
lnRet3yData = log_returns_1d.iloc[totalDays-(3*tdays):totalDays]
# Calc Standard Deviation of daily log returns per grouping
log_returns_1y = lnRet1yData.sum()
log_returns_3y = lnRet3yData.sum()
# Calculate standard deviations over 1, and 3 years
stdev_1y = lnRet1yData.std()
stdev_3y = lnRet3yData.std()
# Create a DataFrame to store the results
invPerf = pd.DataFrame({'StDev_1y': stdev_1y, 'StDev_3y': stdev_3y,'LnRet_1y':log_returns_1y,'LnRet_3y' : log_returns_3y})
SP500complete = invPerf.index.tolist() # refresh
# Create an empty DataFrame to store the data
compPerf = pd.DataFrame(columns=['Symbol', 'MarketCap', 'Beta', 'EPS'])
# Loop through each ticker and extract the necessary data
for ticker in SP500complete:
# Fetch the stock data for the ticker
stock_data = yf.Ticker(ticker)
# Extract the market capitalisation, beta, and earnings per share
market_cap = stock_data.info.get('marketCap')
beta = stock_data.info.get('beta')
eps = stock_data.info.get('trailingEps')
# Create a dictionary to store the data
compDict = {'Symbol': ticker,'MarketCap': market_cap,'Beta': beta,'EPS': eps}
# Append the dictionary to the DataFrame
compPerf = compPerf.append(compDict, ignore_index=True)
# set Symbol as index for joining to workingData
esgData.set_index('Symbol',inplace = True)
compPerf.set_index('Symbol', inplace=True)
# Save scraped data to CSV as checkpoint
compPerf.to_csv('compPerf.csv')
invPerf.to_csv('invPerf.csv')
(501, 12) Symbol object Company object Sector object Industry object Link object ScrapDate object TotalESGScore float64 EnvironmentRiskScore float64 SocialRiskScore float64 GovernanceRiskScore float64 ControversyLevel float64 LastUpdateDate object dtype: object [*********************100%***********************] 501 of 501 completed 2 Failed downloads: - SIVB: No data found, symbol may be delisted - FRC: Data doesn't exist for startDate = 1577768400, endDate = 1672462800 Company 0 Sector 0 Industry 0 TotalESGScore 62 EnvironmentRiskScore 62 SocialRiskScore 62 GovernanceRiskScore 62 ControversyLevel 62 LastUpdateDate 62 MarketCap 8 Beta 28 EPS 10 StDev_1y 8 StDev_3y 8 LnRet_1y 8 LnRet_3y 8 dtype: int64 Company 0 Sector 0 Industry 0 TotalESGScore 0 EnvironmentRiskScore 0 SocialRiskScore 0 GovernanceRiskScore 0 ControversyLevel 0 LastUpdateDate 0 MarketCap 0 Beta 0 EPS 0 StDev_1y 0 StDev_3y 0 LnRet_1y 0 LnRet_3y 0 dtype: int64
# Join dataframes (esgData, compPerf, invPerf) to combine for one
workingData = esgData.join(compPerf, how = 'left')
workingData = workingData.join(invPerf,how = 'left')
print(workingData.isnull().sum()) # review whats missing at a high level
len(workingData) # 501 rows in index
workingData = workingData.dropna() # reduces from 503 to ~ 421 entries of complete data, ~15% missing
workingData = workingData.drop(columns='Company') # Remove individual names
# workingData = workingData.reset_index(drop=True, inplace=True) # remove ticker from index
print(workingData.isnull().sum()) # review whats missing at a high level again
sampleSize = len(workingData)
print("Sample size for the tuples with sufficient data is "+str(sampleSize))
Company 0 Sector 0 Industry 0 TotalESGScore 62 EnvironmentRiskScore 62 SocialRiskScore 62 GovernanceRiskScore 62 ControversyLevel 62 LastUpdateDate 62 MarketCap 8 Beta 28 EPS 10 StDev_1y 8 StDev_3y 8 LnRet_1y 8 LnRet_3y 8 dtype: int64 Sector 0 Industry 0 TotalESGScore 0 EnvironmentRiskScore 0 SocialRiskScore 0 GovernanceRiskScore 0 ControversyLevel 0 LastUpdateDate 0 MarketCap 0 Beta 0 EPS 0 StDev_1y 0 StDev_3y 0 LnRet_1y 0 LnRet_3y 0 dtype: int64 Sample size for the tuples with sufficient data is 421
# Fix up data types
workingData['MarketCap'] = workingData['MarketCap'].astype(float)
workingData['Beta'] = workingData['Beta'].astype(float)
workingData['EPS'] = workingData['EPS'].astype(float)
workingData['ControversyLevel'] = workingData['ControversyLevel'].astype(str)
# SAVE TO CSV AS A CHECKPOINT
workingData.to_csv('workingDataCleaned.csv')
workingData.sample(25) # Review sample of DF
| Sector | Industry | TotalESGScore | EnvironmentRiskScore | SocialRiskScore | GovernanceRiskScore | ControversyLevel | LastUpdateDate | MarketCap | Beta | EPS | StDev_1y | StDev_3y | LnRet_1y | LnRet_3y | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Symbol | |||||||||||||||
| CRM | Information Technology | Application Software | 13.0 | 2.3 | 6.4 | 4.5 | 2.0 | Aug-22 | 2.059473e+11 | 1.243842 | 0.21 | 0.029686 | 0.027596 | -0.655295 | -0.204278 |
| VMC | Materials | Construction Materials | 29.0 | 15.6 | 4.6 | 8.9 | 2.0 | Aug-22 | 2.624190e+10 | 0.743138 | 4.67 | 0.019513 | 0.024519 | -0.157319 | 0.223908 |
| GWW | Industrials | Industrial Machinery | 15.0 | 3.4 | 5.5 | 5.7 | 1.0 | Aug-22 | 3.420286e+10 | 1.191766 | 32.59 | 0.017887 | 0.020330 | 0.086911 | 0.542710 |
| DE | Industrials | Agricultural & Farm Machinery | 17.0 | 3.2 | 6.9 | 6.6 | 2.0 | Aug-22 | 1.077279e+11 | 1.037297 | 26.39 | 0.022361 | 0.023664 | 0.238272 | 0.946467 |
| FIS | Information Technology | Data Processing & Outsourced Services | 18.0 | 1.3 | 11.1 | 5.7 | 1.0 | Aug-22 | 3.367412e+10 | 0.832160 | -27.56 | 0.030482 | 0.026053 | -0.461202 | -0.673980 |
| TT | Industrials | Building Products | 16.0 | 7.1 | 3.8 | 5.2 | 2.0 | Aug-22 | 3.951685e+10 | 1.180154 | 7.80 | 0.019284 | 0.021483 | -0.160537 | 0.537728 |
| CDNS | Information Technology | Application Software | 12.0 | 1.4 | 6.1 | 4.8 | 0.0 | Aug-22 | 5.909335e+10 | 1.068261 | 3.18 | 0.025643 | 0.025444 | -0.158605 | 0.839855 |
| BDX | Health Care | Health Care Equipment | 27.0 | 3.7 | 15.6 | 7.5 | 3.0 | Aug-22 | 7.153770e+10 | 0.564743 | 5.44 | 0.015262 | 0.017328 | 0.042520 | -0.001725 |
| VFC | Consumer Discretionary | Apparel, Accessories & Luxury Goods | 13.0 | 0.9 | 6.8 | 5.1 | 2.0 | Aug-22 | 7.411689e+09 | 1.484166 | 1.07 | 0.027905 | 0.028636 | -0.925404 | -1.182015 |
| C | Financials | Diversified Banks | 27.0 | 1.8 | 13.1 | 11.8 | 4.0 | Aug-22 | 8.898594e+10 | 1.563335 | 7.28 | 0.020801 | 0.028997 | -0.250313 | -0.459838 |
| UNP | Industrials | Railroads | 25.0 | 9.4 | 12.1 | 3.7 | 2.0 | Aug-22 | 1.213110e+11 | 1.104855 | 11.30 | 0.016526 | 0.019575 | -0.164033 | 0.198410 |
| NTRS | Financials | Asset Management & Custody Banks | 24.0 | 1.9 | 11.8 | 10.7 | 2.0 | Aug-22 | 1.504229e+10 | 1.084095 | 5.78 | 0.021734 | 0.024666 | -0.270360 | -0.094560 |
| JNJ | Health Care | Pharmaceuticals | 25.0 | 0.7 | 15.8 | 8.6 | 4.0 | Aug-22 | 4.129642e+11 | 0.534482 | 4.76 | 0.010927 | 0.013715 | 0.050809 | 0.269101 |
| ALB | Materials | Specialty Chemicals | 29.0 | 15.5 | 5.5 | 7.7 | 2.0 | Aug-22 | 2.395414e+10 | 1.535890 | 30.32 | 0.035057 | 0.034295 | -0.071055 | 1.119612 |
| GE | Industrials | Industrial Conglomerates | 41.0 | 14.1 | 15.0 | 11.6 | 3.0 | Aug-22 | 1.136726e+11 | 1.205137 | 7.27 | 0.022120 | 0.028856 | -0.118103 | -0.050954 |
| IP | Materials | Paper Packaging | 24.0 | 14.3 | 4.5 | 5.1 | 3.0 | Aug-22 | 1.109194e+10 | 1.021463 | 4.52 | 0.018580 | 0.023410 | -0.255541 | -0.098055 |
| O | Real Estate | Retail REITs | 14.0 | 4.0 | 5.5 | 4.8 | 0.0 | Aug-22 | 4.037313e+10 | 0.802955 | 1.42 | 0.014009 | 0.024415 | -0.077590 | 0.014969 |
| DAL | Industrials | Airlines | 26.0 | 9.5 | 11.5 | 5.2 | 2.0 | Aug-22 | 2.306709e+10 | 1.249824 | 2.94 | 0.029058 | 0.036677 | -0.172331 | -0.569542 |
| WHR | Consumer Discretionary | Household Appliances | 16.0 | 4.8 | 7.1 | 4.0 | 2.0 | Aug-22 | 7.196830e+09 | 1.489795 | -35.22 | 0.024665 | 0.028517 | -0.455638 | 0.057785 |
| LLY | Health Care | Pharmaceuticals | 32.0 | 3.4 | 17.4 | 11.6 | 2.0 | Aug-22 | 4.199394e+11 | 0.366995 | 6.33 | 0.017065 | 0.021377 | 0.290851 | 1.072193 |
| CCI | Real Estate | Specialized REITs | 13.0 | 5.2 | 2.9 | 5.1 | 0.0 | Aug-22 | 4.903952e+10 | 0.661857 | 3.83 | 0.019230 | 0.020459 | -0.390170 | 0.050614 |
| CMI | Industrials | Industrial Machinery | 19.0 | 5.8 | 7.5 | 6.1 | 2.0 | Aug-22 | 3.067365e+10 | 0.999296 | 17.75 | 0.017708 | 0.020567 | 0.133158 | 0.381079 |
| ATVI | Communication Services | Interactive Home Entertainment | 19.0 | 0.2 | 12.5 | 6.3 | 4.0 | Aug-22 | 6.178423e+10 | 0.416256 | 2.35 | 0.017374 | 0.020824 | 0.131891 | 0.270625 |
| KMX | Consumer Discretionary | Automotive Retail | 12.0 | 0.1 | 7.9 | 4.0 | 1.0 | Aug-22 | 1.134317e+10 | 1.425756 | 2.94 | 0.033689 | 0.031607 | -0.744445 | -0.364511 |
| WY | Real Estate | Specialized REITs | 18.0 | 10.5 | 3.3 | 3.8 | 2.0 | Aug-22 | 2.166864e+10 | 1.491907 | 1.71 | 0.019511 | 0.029580 | -0.223569 | 0.132287 |
# Sector breakdown
print( len( (workingData[ 'Sector' ].unique() ) ) ) # HOW MANY SECTORS ARE THERE? 11
print(workingData['Sector' ].unique()) # WHAT ARE THE SECTORS?.
11 ['Health Care' 'Industrials' 'Consumer Discretionary' 'Information Technology' 'Financials' 'Consumer Staples' 'Utilities' 'Materials' 'Real Estate' 'Communication Services' 'Energy']
# Industry breakdown
print( len( (workingData[ 'Industry' ].unique() ) ) ) # 118 INDUSTRY clasifiers?
print( workingData[ 'Industry' ].unique() ) # LABELS OF INDUSTRY
118 ['Health Care Equipment' 'Airlines' 'Automotive Retail' 'Technology Hardware, Storage & Peripherals' 'Pharmaceuticals' 'Health Care Distributors' 'Reinsurance' 'IT Consulting & Other Services' 'Application Software' 'Semiconductors' 'Agricultural Products' 'Data Processing & Outsourced Services' 'Multi-Utilities' 'Independent Power Producers & Energy Traders' 'Life & Health Insurance' 'Property & Casualty Insurance' 'Multi-line Insurance' 'Insurance Brokers' 'Internet Services & Infrastructure' 'Specialty Chemicals' 'Building Products' 'Electrical Components & Equipment' 'Asset Management & Custody Banks' 'Specialized REITs' 'Internet & Direct Marketing Retail' 'Communications Equipment' 'Industrial Gases' 'Electronic Components' 'Auto Parts & Equipment' 'Office REITs' 'Gas Utilities' 'Interactive Home Entertainment' 'Residential REITs' 'Paper Packaging' 'Water Utilities' 'Consumer Finance' 'Specialty Stores' 'Aerospace & Defense' 'Diversified Banks' 'Metal & Glass Containers' 'Computer & Electronics Retail' 'Distillers & Vintners' 'Biotechnology' 'Life Sciences Tools & Services' 'Multi-Sector Holdings' 'Packaged Foods & Meats' 'Financial Exchanges & Data' 'Real Estate Services' 'Hotels, Resorts & Cruise Lines' 'Technology Distributors' 'Fertilizers & Agricultural Chemicals' 'Regional Banks' 'Household Products' 'Air Freight & Logistics' 'Cable & Satellite' 'Managed Health Care' 'Restaurants' 'Industrial Machinery' 'Health Care Supplies' 'Oil & Gas Exploration & Production' 'Hypermarkets & Super Centers' 'Diversified Support Services' 'Research & Consulting Services' 'Railroads' 'Health Care Services' 'Integrated Oil & Gas' 'Electric Utilities' 'Agricultural & Farm Machinery' 'General Merchandise Stores' 'Homebuilding' 'Movies & Entertainment' 'Health Care Facilities' 'Personal Products' 'Diversified Chemicals' 'Automobile Manufacturers' 'Copper' 'Retail REITs' 'Systems Software' 'Industrial Conglomerates' 'Interactive Media & Services' 'Consumer Electronics' 'Investment Banking & Brokerage' 'Oil & Gas Equipment & Services' 'Leisure Products' 'Hotel & Resort REITs' 'Advertising' 'Construction & Engineering' 'Trucking' 'Semiconductor Equipment' 'Oil & Gas Storage & Transportation' 'Soft Drinks' 'Food Retail' 'Distributors' 'Home Improvement Retail' 'Alternative Carriers' 'Casinos & Gaming' 'Home Furnishings' 'Construction Materials' 'Tobacco' 'Oil & Gas Refining & Marketing' 'Gold' 'Apparel, Accessories & Luxury Goods' 'Steel' 'Housewares & Specialties' 'Construction Machinery & Heavy Trucks' 'Health Care REITs' 'Industrial REITs' 'Human Resource & Employment Services' 'Environmental & Facilities Services' 'Electronic Equipment & Instruments' 'Food Distributors' 'Integrated Telecommunication Services' 'Brewers' 'Electronic Manufacturing Services' 'Wireless Telecommunication Services' 'Trading Companies & Distributors' 'Drug Retail' 'Household Appliances']
print(workingData.dtypes) # Review data type for each variable
Sector object Industry object TotalESGScore float64 EnvironmentRiskScore float64 SocialRiskScore float64 GovernanceRiskScore float64 ControversyLevel object LastUpdateDate object MarketCap float64 Beta float64 EPS float64 StDev_1y float64 StDev_3y float64 LnRet_1y float64 LnRet_3y float64 LogMarketCap float64 dtype: object
print(workingData.isnull().sum()) # Review missing columns
Sector 0 Industry 0 TotalESGScore 0 EnvironmentRiskScore 0 SocialRiskScore 0 GovernanceRiskScore 0 ControversyLevel 0 LastUpdateDate 0 MarketCap 0 Beta 0 EPS 0 StDev_1y 0 StDev_3y 0 LnRet_1y 0 LnRet_3y 0 LogMarketCap 0 dtype: int64
The dataset contains a mix of variable types covering financial and non-financial measurements as outlined in the table below.
from tabulate import tabulate
table =[['Variable','Data Split','Data Type','Description'],
['Symbol','Qualitative','Nominal','Unique Stock Identifier'],
['Company','Qualitative','Nominal','Stock Name'],
['Sector','Qualitative','Categorical','The highest grouping in the GICS* market segmentation methodology.'],
['Industry','Qualitative','Categorical','The second highest grouping in the GICS* market segmentation methodology.'],
['TotalESGScore','Quantitative','Derived','Sum of Environmental, Social, Government Scores. **'],
['EnvironmentRiskScore','Quantitative','Discrete','Measurement regarding the Environmental Risks each company is specifically exposed to.'],
['SocialRiskScore','Quantitative','Discrete','Measurement regarding the Social Risks each company is specifically exposed to.'],
['GovernanceRiskScore','Quantitative','Discrete','Measurement regarding the Governance Risks each company is specifically exposed to.'],
['ControversyLevel','Qualitative','Categorical','Measurement regarding the Controversy each company is specifically exposed to. \
\n Scores range from 0 (negligible) to 5 (severe)'],
['LastUpdateDate','Quantitative','Date','Date Score Provided to Company'],
['MarketCap','Quantitative','Discrete','The value of outstanding common shares owned by stockholders of a publicly traded company \
\n Market capitalization is derived as a product of the market price per common share and the number of common shares outstanding.'],
['Beta','Quantitative','Discrete','A measurement that shows how value moves with regards to a relative measure for an individual asset,\
commonly an index the stock exists within. Higher than 1 indicates a higher relative movement for the asset compared to its benchmark, lower than 1 indicates the opposite.'],
['EPS','Quantitative','Discrete','Earnings per share (EPS) indicates the overall profitability of a company on a per share basis. \
\n The measurement is derived by dividing the profit over the outstanding common stock shares for a company.\
\n The profit may be adjusted to account for less regular transactions that would otherwise skew the figure and not commonly represent the businesses primary activity.\
\n A relatively high EPS figure indicates a relatively more profitable company when contrasting against either peers at the same point in time or ones prior results.'],
['StDev_1y','Quantitative','Discrete','Standard Deviation of daily log Return for 1y'],
['StDev_3y','Quantitative','Discrete','Standard Deviation of daily log Return for 3y'],
['LnRet_1y','Quantitative','Discrete','Sum of daily log returns for 1y'],
['LnRet_3y','Quantitative','Discrete','Sum of daily log returns for 3y']]
print(tabulate(table, headers='firstrow', tablefmt='fancy_grid'))
╒══════════════════════╤══════════════╤═════════════╤═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕ │ Variable │ Data Split │ Data Type │ Description │ ╞══════════════════════╪══════════════╪═════════════╪═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡ │ Symbol │ Qualitative │ Nominal │ Unique Stock Identifier │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Company │ Qualitative │ Nominal │ Stock Name │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Sector │ Qualitative │ Categorical │ The highest grouping in the GICS* market segmentation methodology. │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Industry │ Qualitative │ Categorical │ The second highest grouping in the GICS* market segmentation methodology. │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ TotalESGScore │ Quantitative │ Derived │ Sum of Environmental, Social, Government Scores. ** │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ EnvironmentRiskScore │ Quantitative │ Discrete │ Measurement regarding the Environmental Risks each company is specifically exposed to. │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ SocialRiskScore │ Quantitative │ Discrete │ Measurement regarding the Social Risks each company is specifically exposed to. │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ GovernanceRiskScore │ Quantitative │ Discrete │ Measurement regarding the Governance Risks each company is specifically exposed to. │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ ControversyLevel │ Qualitative │ Categorical │ Measurement regarding the Controversy each company is specifically exposed to. │ │ │ │ │ Scores range from 0 (negligible) to 5 (severe) │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ LastUpdateDate │ Quantitative │ Date │ Date Score Provided to Company │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ MarketCap │ Quantitative │ Discrete │ The value of outstanding common shares owned by stockholders of a publicly traded company │ │ │ │ │ Market capitalization is derived as a product of the market price per common share and the number of common shares outstanding. │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Beta │ Quantitative │ Discrete │ A measurement that shows how value moves with regards to a relative measure for an individual asset,commonly an index the stock exists within. Higher than 1 indicates a higher relative movement for the asset compared to its benchmark, lower than 1 indicates the opposite. │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ EPS │ Quantitative │ Discrete │ Earnings per share (EPS) indicates the overall profitability of a company on a per share basis. │ │ │ │ │ The measurement is derived by dividing the profit over the outstanding common stock shares for a company. │ │ │ │ │ The profit may be adjusted to account for less regular transactions that would otherwise skew the figure and not commonly represent the businesses primary activity. │ │ │ │ │ A relatively high EPS figure indicates a relatively more profitable company when contrasting against either peers at the same point in time or ones prior results. │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ StDev_1y │ Quantitative │ Discrete │ Standard Deviation of daily log Return for 1y │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ StDev_3y │ Quantitative │ Discrete │ Standard Deviation of daily log Return for 3y │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ LnRet_1y │ Quantitative │ Discrete │ Sum of daily log returns for 1y │ ├──────────────────────┼──────────────┼─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ LnRet_3y │ Quantitative │ Discrete │ Sum of daily log returns for 3y │ ╘══════════════════════╧══════════════╧═════════════╧═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╛
*The Global Industry Classification Standard (GICS®) exists as an annually updated industry analysis framework to assist investors domicile business by their key business activities. It has been managed by MSCI and S&P Dow Jones Indices with a goal to define consistent and exhaustive industry definitions. GICS framework consistes of a hierarchical industry classification system with 4 main levels; Sectors, Industry Groups, Industries and Sub-Industries. The scope of this analysis is set to the two highest levels (Sector & Industry). All definitions are standardized and applied to companies globally. Each company is assigned a single GICS® classification in each of the four tiers, according to its principal business activity which in turn is predominately defined by sources of revenue.
**Sustainalytics’ ESG Risk Ratings assess the degree to which a company’s enterprise business value is at risk driven by environmental, social and governance issues. The rating employs a two-dimensional framework that combines an assessment of a company’s exposure to industry-specific material ESG issues with an assessment of how well the company is managing those issues. The final ESG Risk Ratings scores are a measure of unmanaged risk on an absolute scale of 0-100, with a lower score signaling less unmanaged ESG Risk. The ESG Risk Ratings are categorized across five risk levels: negligible (0-10), low (10-20), medium (20-30), high (30-40) and severe (40+).
Below we observe the breakdown across the quantitative information.
print(workingData.describe().round(3)) # Print and review quantitative information
TotalESGScore EnvironmentRiskScore SocialRiskScore \
count 421.000 421.000 421.000
mean 21.553 5.649 9.128
std 7.059 5.224 3.782
min 7.000 0.000 1.100
25% 16.000 1.500 6.600
50% 21.000 3.700 8.900
75% 26.000 8.900 11.600
max 46.000 23.900 21.000
GovernanceRiskScore MarketCap Beta EPS StDev_1y \
count 421.000 4.210000e+02 421.000 421.000 421.000
mean 6.779 8.065465e+10 1.032 8.160 0.022
std 2.221 2.164240e+11 0.403 25.538 0.006
min 3.100 8.810063e+06 0.163 -35.220 0.011
25% 5.300 1.684141e+10 0.753 2.300 0.017
50% 6.200 3.238837e+10 1.032 4.760 0.021
75% 7.800 6.585069e+10 1.280 8.460 0.025
max 15.500 2.755039e+12 2.490 480.420 0.049
StDev_3y LnRet_1y LnRet_3y LogMarketCap
count 421.000 421.000 421.000 421.000
mean 0.025 -0.131 0.203 10.555
std 0.007 0.294 0.388 0.495
min 0.013 -1.063 -1.830 6.945
25% 0.020 -0.309 0.003 10.226
50% 0.023 -0.118 0.225 10.510
75% 0.028 0.055 0.416 10.819
max 0.059 0.664 1.485 12.440
The target features of this project will be the Environmental, Social, and Governance ratings of the companies within the Index as additive parts towards a Total score.
Listed companies are increasingly adopting Environmental, Social, and Governance risk measurement in a drive for enhanced transparency surrounding non-financial risk for the benefit of stakeholders and industry at large.
Non-financial risk measurement is conducted by different specialist firms who differentiate their product by employing specific and proprietary scoring mechanisms for commonly listed companies.
The scope of this analysis will be domiciled to the ESG conducted by Sustainalytics (Research Provider) across the S&P 500 Index (Index). Investment performance metrics have been aggregated using built in computational packages utilisting RStudio across the index.
The primary objective here-in is to exlpore how combinations of investment performance and non-financial risk measurement metrics across the Index predict the possible ESG scores for companies by Sector for this specific research provider.
A secondary objective for this investigation is to better understand what predictor values may add value to the accuracy of predictions where we are to extrapolate estimates across a broader universe of listed companies.
# Load in Data Visualisation packages
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.colors as mcolors
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.style.use("seaborn")
# Set colour scheme for Controversy Levels
num_steps = len(workingData['ControversyLevel'].unique())
trafficLights = sns.color_palette("RdYlGn", num_steps)[::-1]
0.8403193612774451
# one-variable plots
# WE HAVE ~400 OF 501 DATAPOINTS FOR THE INDEX, WHAT IS THE BREAKDOWN PER INDUSTRY OF THOSE WITH DATA ?
# p1 - histogram of Industry Freq
p1Data = workingData[['Sector', 'Industry']].groupby(['Sector']).count().reset_index(drop=False)
# Define vectors for chart
p1Sector = p1Data['Sector']
p1IndustryN = p1Data['Industry']
# Plot histogram
plt.figure(figsize = (15, 10))
plt.bar(p1Sector, p1IndustryN)
# Add Hist Labels
plt.xlabel('Sector')
plt.ylabel('Industry Count w/Data')
plt.title('Figure 1: By Sector, what is the breakdown of companies we will review? (n~'+str(sampleSize)+')')
plt.xticks(rotation=90) # Rotating x-axis labels for better readability
plt.show()# Displaying the histogram
# WE HAVE ~80 OF 501 DATAPOINTS MISSING FOR THE INDEX, WHAT IS THE BREAKDOWN PER INDUSTRY OF THOSE?
# Create group by summaries of the working and starting data grouped by Sector for gap analysis
p2Data = esgData[['Sector','Industry']].groupby(['Sector']).count()
p2Data = p2Data.reset_index(drop=False)
p2Data = p2Data.rename(columns={'Industry': 'IndCountALL'})
p2Data['IndCountComplete'] = p1IndustryN
p2Data['IndCountMissing'] = p2Data['IndCountALL'] - p2Data['IndCountComplete']
p2Data['MissingPct'] = p2Data['IndCountMissing']/p2Data['IndCountALL']
p2Data['MissingPct'] = p2Data['MissingPct'].round(2)
missingN = p2Data['IndCountMissing'].sum()
plt.figure(figsize=(15, 10))
plt.bar(p2Data['Sector'],100*p2Data['MissingPct'])
plt.xlabel('Sector')
plt.ylabel('Pct Missing (%)')
plt.title('Figure 2: How much data is missing from each Sector? (n~'+str(missingN)+")")
plt.xticks(rotation=90) # Rotating x-axis labels for better readability
plt.show()# Displaying the histogram
# We note that this Communications Services may be harder to impute as just over half of their index participants do not have data, and there is a very small sample (n~12) for those who do.
# p3 - boxplit of total esg score distribution
plt.figure(figsize = (15,8))
sns.boxplot(x=workingData['Sector'],y=workingData['TotalESGScore']);
# Add labels# Add labels
plt.title('Figure 3: How are Total ESG Scores distributed by Sector? (n~'+str(sampleSize)+')', fontsize = 15)
plt.xticks(rotation=90) # Rotating x-axis labels for better readability
# plt.
plt.show();
# p4 - rotated boxplot of market cap by sector
plt.figure(figsize = (15,8))
# Transforming the MarketCap column to logarithmic scale
workingData['LogMarketCap'] = np.log10(workingData['MarketCap'])
sns.boxplot(y=workingData['Sector'], x=workingData['LogMarketCap']); # need to find a way to transform MktCap
# Add labels
plt.title('Figure 4: What is the Distribution of Market Capitalisation by Sector? (n~'+str(sampleSize)+')', fontsize = 15)
plt.grid(True)
plt.show();
# Note that while this is skewed by outliers the poinnt remains the same with the grouping and frequency of outliers per sector.
# Materials is the only sector without outliers on the top-side (big fish)
# p5 - Is Bigger Badder - what is the relationship between Market Capitalisation and Controversy by Sector?
MarketCapMil = workingData['MarketCap']/1000000
plt.rcParams['axes.facecolor'] = 'white'
# Plot Figure
plt.figure(figsize = (15,12))
sns.stripplot(y = MarketCapMil, x = workingData['Sector'], hue = workingData['ControversyLevel'], jitter=True,
dodge=True, palette = trafficLights, size=15 )
# Add labels
plt.title('Figure 5: Are bigger comanies better, badder, or a bit of both? n~'+str(sampleSize)+')')
plt.legend(loc = 'upper right')
plt.ylabel(" Market Cap $T")
plt.xticks(rotation=90)
plt.show();
# p6 - look at the dipserson of Beta risk as facted by controversy level
# Define plot
plt.figure(figsize = (20,15))
sns.boxplot( data = workingData, x = 'Beta', y= 'Sector', hue = 'ControversyLevel', palette = trafficLights)
# Add labels
plt.title('Figure 6: How is Beta distributed by Sector by Controversy? n~'+str(sampleSize)+')', fontsize = 15);
plt.legend(loc = 'upper right')
plt.show();
The objective at hand is to predict the target features of Envifonmental, Social, and Govnernance Rankings.
We are graced with data whose strucural integrity requires limited wrangling. The data are indexed with a Primary Key (Stock Symbol) and afford a scalable structural integrity should additional providers be incorporated for added breadth.
The data we have on the Index allows technicians to manage their expectations with a reasonable effectiveness at the start of their analysis with roughly 15% of the initial scope screened out.
We may be mindful of Sectors whose sample sizes are quite low as we seek further quantiative interrogration.
In order to better understand correlation from causeation in the calibration of our model we will proceed to review the inter-sector commonalities of both Financial and Non-Financial risk measurements in Phase 2.
At a high level we now know that there is still more to discover when considering the relationship between financial and non-financial risk measurements.
Aroussi, R. (n.d.). yfinance: Yahoo! Finance market data downloader. [online] PyPI. Available at: https://pypi.org/project/yfinance/.
MSCI (2023). GICS - Global Industry Classification Standard. [online] www.msci.com. Available at: https://www.msci.com/our-solutions/indexes/gics.
Rudden, K. (2019). Scraping ESG Scores. [online] www.kylerudden.com. Available at: https://www.kylerudden.com/blog/scraping-esg-scores/ [Accessed 9 May 2023].
Sullivan, K., Bujno, M., Sullivan, K. and Bujno, M. (2023). Emerging trends in ESG governance for 2023. [online] The Harvard Law School Forum on Corporate Governance. Available at: https://corpgov.law.harvard.edu/2023/03/26/emerging-trends-in-esg-governance-for-2023/ [Accessed 10 Apr. 2023].
Sustainalytics (2021). ESG Risk Ratings - Methodology Abstract Version 2.1. [online] Available at: https://www.sustainalytics.com/docs/knowledgehublibraries/default-document-library/sustainalytics_esg-risk-rating_methodology-abstract-(2).pdf?sfvrsn=15b2cba1_0 [Accessed 18 Apr. 2023].
Sustainalytics (n.d.). Overview of Sustainalytics’ ESG Risk Ratings. [online] Available at: https://connect.sustainalytics.com/hubfs/SFS/Sustainalytics%20ESG%20Risk%20Ratings_Issuer%20Backgrounder.pdf.
sustainalytics.com. (n.d.). ESG Risk Ratings. [online] Available at: https://www.sustainalytics.com/esg-data.
www.spglobal.com. (2023). S&P 500® - S&P Dow Jones Indices. [online] Available at: https://www.spglobal.com/spdji/en/indices/equity/sp-500/#data.
sns.pairplot(workingData) #
<seaborn.axisgrid.PairGrid at 0x1861a6aad10>